from pipeline example described by Pierre-Luc Germain, course sta426 UZH

Loading necessary libraries

# set local path
local.path <- getwd()
setwd(local.path)

# use source() function to source the functions we want to execute
source("functions/functions_qc.R")
source("functions/functions_analysis.R")

Preprocessing & Clustering

Loading the data

We look at both the raw and unfiltered data from the cell ranger output, to see what yields the best results. For both data sets, we first ran them through our quality control function. Since the filtered data already has all the empty droplets removed, this step will be skipped for the filtered data, but not for the raw data. In the following section we will quickly go over what steps we took in our quality control function and how we made sure they were sensible.

The first step in our pipeline is to seperate the antibody-derived tag (ADT) data from the remaining mRNA counts and clean them up. Ideally we'd like to identify the cells which failed to capture the ADT's and remove them. This is normally done by removing cells with ADT counts less than or equal to half of the total number of tags. If we do that we loose a huge number of cells as shown in the summary output. We also created a plot showing the distribution of the number of detected ADT's and from there we observe that we have lots of cells with zero ADT counts and a smaller peak at around 50. From this plot we decided to change the threshold at which cells will be discarded to 25, to not throw away as many cells. But even with this change it seems like to many cells will be discarded because of low quality ADT data, which is why we decided to ignore this part of the data altogether.

In the next step, the empty droplets will be removed. As mentioned before this only happens for the raw data. For this we ran the funciton emptyDrops(), which tests if the counts of a cell are significantly different from the ambient mRNA pool. If this is not the case, then it is most likely an empty droplet. After looking at some summary statistics of the output of this function, we realized that there were non-significant cellbarcodes, which were bounded by the number of iterations. We therefore increased the number of iterations in emptyDrops() until this wasn't the case anymore (100'000). There's also a plot showing the histogram of the p-values given by emptyDrops(), which should ideally be approximatelly uniform especially at the beginning. This was luckily the case. It is interesting to note, that this part of the quality control pipeline is responsible for the most amount of discarded cells.

Next we removed the dead cells, e.g. the cells with high mitochondrial gene expression. The approach we used was the same as in class. Lastly we also removed cells, which have low library sizes and detection rates. As seen in class we used the perCellQCMetrics() and isOutlier() functions to do so. Before removing all these cells we looked at the plots showing the above metrics to have an idea what they look like. We also produced a plot showing the log-fold-change of the discarded vs. kept cells. This plot should help decide if an entire cell type is accidently discarded because of our quality control. If that would be the case, we would observe an enrichment of a certain cell type in the discarded group, which can be identified by an increased expression of the corresponding marker genes.

## data can be downloaded from : http://imlspenticton.uzh.ch/dump/files_for_levesque.tar

#TODO: can be deleted?
# patient1_HS.path <- file.path("data", "patient1_HS")
# 
# # read in filtered data
# fnameHS <- file.path(patient1_HS.path, "outs/filtered_feature_bc_matrix")

#pick your poison

#sces <- read_previous_data()
sces <- read_and_qc(4)
## [1] "###################################"
## [1] "Quality Control:"
## [1] "patient1_HS"
## [1] "Summary of discarded antibodies:"
##    Mode   FALSE    TRUE 
## logical    4387      22 
## [1] "Distribution of the number of detected antibodies across all cells (red line = threshold, below which cells would be discarded):"

## [1] "Summary of the cells identified as dead cells:"
##    Mode   FALSE    TRUE 
## logical    3472     937 
## [1] "Plot of cells, which will be discarded because they are identified as dead cells:"

## [1] "Summary of all cells that will be discarded and for what reasons:"
## DataFrame with 1 row and 4 columns
##     LibSize    NExprs  MitoProp     Total
##   <integer> <integer> <integer> <integer>
## 1         0       330       937       937
## [1] "Plots looking at some quality metrics calculated by perCellQCMetrics:"
## [1] "Check that we don't accidently get rid of small cell populations because of our cleaning of the data:"

## [1] "Summary of the doublets identified by the scDblFinder algorithm:"
## 
## singlet doublet 
##    3350     122 
## [1] "Percentage of removed cells:"
## [1] 24.01905
## [1] "###################################"
## [1] "Quality Control:"
## [1] "patient1_SCC"
## [1] "Summary of discarded antibodies:"
##    Mode   FALSE    TRUE 
## logical    3065       8 
## [1] "Distribution of the number of detected antibodies across all cells (red line = threshold, below which cells would be discarded):"

## [1] "Summary of the cells identified as dead cells:"
##    Mode   FALSE    TRUE 
## logical    2371     702 
## [1] "Plot of cells, which will be discarded because they are identified as dead cells:"

## [1] "Summary of all cells that will be discarded and for what reasons:"
## DataFrame with 1 row and 4 columns
##     LibSize    NExprs  MitoProp     Total
##   <integer> <integer> <integer> <integer>
## 1         0       410       702       702
## [1] "Plots looking at some quality metrics calculated by perCellQCMetrics:"
## [1] "Check that we don't accidently get rid of small cell populations because of our cleaning of the data:"

## [1] "Summary of the doublets identified by the scDblFinder algorithm:"
## 
## singlet doublet 
##    2316      55 
## [1] "Percentage of removed cells:"
## [1] 24.63391
## [1] "###################################"
## [1] "Quality Control:"
## [1] "patient2_HS"
## [1] "Summary of discarded antibodies:"
##    Mode   FALSE    TRUE 
## logical    8206      38 
## [1] "Distribution of the number of detected antibodies across all cells (red line = threshold, below which cells would be discarded):"

## [1] "Summary of the cells identified as dead cells:"
##    Mode   FALSE    TRUE 
## logical    6837    1407 
## [1] "Plot of cells, which will be discarded because they are identified as dead cells:"

## [1] "Summary of all cells that will be discarded and for what reasons:"
## DataFrame with 1 row and 4 columns
##     LibSize    NExprs  MitoProp     Total
##   <integer> <integer> <integer> <integer>
## 1         0       401      1407      1407
## [1] "Plots looking at some quality metrics calculated by perCellQCMetrics:"
## [1] "Check that we don't accidently get rid of small cell populations because of our cleaning of the data:"

## [1] "Summary of the doublets identified by the scDblFinder algorithm:"
## 
## singlet doublet 
##    6360     477 
## [1] "Percentage of removed cells:"
## [1] 22.85298
## [1] "###################################"
## [1] "Quality Control:"
## [1] "patient2_AK"
## [1] "Summary of discarded antibodies:"
##    Mode   FALSE    TRUE 
## logical    5028       8 
## [1] "Distribution of the number of detected antibodies across all cells (red line = threshold, below which cells would be discarded):"

## [1] "Summary of the cells identified as dead cells:"
##    Mode   FALSE    TRUE 
## logical    4089     947 
## [1] "Plot of cells, which will be discarded because they are identified as dead cells:"

## [1] "Summary of all cells that will be discarded and for what reasons:"
## DataFrame with 1 row and 4 columns
##     LibSize    NExprs  MitoProp     Total
##   <integer> <integer> <integer> <integer>
## 1         0       319       947       950
## [1] "Plots looking at some quality metrics calculated by perCellQCMetrics:"
## [1] "Check that we don't accidently get rid of small cell populations because of our cleaning of the data:"

## [1] "Summary of the doublets identified by the scDblFinder algorithm:"
## 
## singlet doublet 
##    3907     179 
## [1] "Percentage of removed cells:"
## [1] 22.41859
#sces <- read_10x()
sces <- lapply(sces, split_data)

# only select the gene expression data, drop the antibody capture
sce.patient1_HS <- sces[[1]]
sce.patient1_SCC <- sces[[2]]
sce.patient2_HS <- sces[[3]]
sce.patient2_AK <- sces[[4]]

# add name to sce for output graph
attr(sce.patient1_HS, "name") <- "Patient1 HS"
attr(sce.patient1_SCC, "name") <- "Patient1 SCC"
attr(sce.patient2_HS, "name") <- "Patient2 HS"
attr(sce.patient2_AK, "name") <- "Patient2 AK"

#convert rownames
# rownames(sce.patient1_HS) <- convert_rownames(sce.patient1_HS)
# rownames(sce.patient1_SCC) <- convert_rownames(sce.patient1_SCC)
# rownames(sce.patient2_HS) <- convert_rownames(sce.patient2_HS)
# rownames(sce.patient2_AK) <- convert_rownames(sce.patient2_AK)

#make sure every rowname has the format ENS_ID.Gene_Name
rownames(sce.patient1_HS) <- paste_rownames(sce.patient1_HS)
rownames(sce.patient1_SCC) <- paste_rownames(sce.patient1_SCC)
rownames(sce.patient2_HS) <- paste_rownames(sce.patient2_HS)
rownames(sce.patient2_AK) <- paste_rownames(sce.patient2_AK)

Normalization & reduction

The markers found in the literature D. Kim, Chung, and Kim (2020) Solé-Boldo (2020) were used to separate the different cell types and their subtypes within them. A few markers didn't appear at all in our data (WISP2 [Secretory-reticular fibroblast], SEPP1 [Macrophage], TYP1 [Melanocyte]) so the corresponding cell types were selected using their remaining markers present. The Pericyte and vSMC markers were removed because they clustered badly and polluted the Fibroblast population. Since those markers are known to correlate with specific cell types, their genes were added to the list of high variable genes used to produce a lower dimensional representation of the data, even when their expression didn't vary enough to be selected as highly variable.

Types Subtypes Markers
Keratinocyte DSC3 DSP LGALS7
basal KRT5 KRT14 COL17A1
suprabasal KRT1 KRT10
differentiated LOR SPINK5
ORS KRT6B
channel GJB2 GJB6 ATP1B1
sebaceous gland MGST1 FASN
sweat gland DCD AQP5
Fibroblast DCN LUM MMP2
secretory reticular SLPI CTHRC1 MFAP5 TSPAN8
proinflammatory CCL19 APOE CXCL2 CXCL3 EFEMP1
secretory papillary APCDD1 ID1 WIF1 COL18A1 PTGDS
mesenchymal ASPN POSTN GPC3 TNN SFRP1
Pericyte & vSMC ACTA2 TAGLN
pericyte RGS5
vSMC MYL9 TPM2 RERGL
Endothelial cell PECAM1 SELE CLDN5 VWF
lymphatic PROX1 LYVE1
Myeloid HLA-DRA FCER1G TYROBP AIF1
dendritic CD1C
langerhans CD207
macrophage CD68 RNASE1 ITGAX
Lymphocyte CD3D CD3E CD52 IL7R
Melanocyte DCT MLANA PMEL
Schwann cell CDH19 NGFR
Mitotic cell UBE2C PCNA
# run PCA

# using known genes
# genes not found in the data : WISP2" "SEPP1" "TYP1" 
# genes linked to pericytevSMC : "ACTA2","TAGLN"
# were removed from the following list
known_markers <- c("DSC3","DSP","LGALS7","KRT5","KRT14","COL17A1","KRT1","KRT10","LOR", "SPINK5","KRT6B","GJB2","GJB6","ATP1B1","MGST1","FASN","DCD","AQP5","DCN","LUM","MMP2","SLPI","CTHRC1","MFAP5","TSPAN8","CCL19","APOE","CXCL2","CXCL3","EFEMP1","APCDD1","ID1","WIF1","COL18A1","PTGDS","ASPN","POSTN","GPC3","TNN","SFRP1","RGS5","MYL9","TPM2","RERGL","PECAM1","SELE","CLDN5","VWF","PROX1","LYVE1","HLA-DRA","FCER1G","TYROBP","AIF1","CD1C","CD207","CD68","RNASE1","ITGAX","CD3D","CD3E","CD52","IL7R","DCT","MLANA","PMEL","CDH19","NGFR","UBE2C","PCNA")

sce.patient1_HS <- sc_PCA(sce.patient1_HS, hvg.patient1_HS, known_markers)
## Adding IL7R to the gene set used for PCA in Patient1 HS

sce.patient1_SCC <- sc_PCA(sce.patient1_SCC, hvg.patient1_SCC, known_markers)
## Adding KRT10 APOE TNN AIF1 CD3E NGFR to the gene set used for PCA in Patient1 SCC

sce.patient2_HS <- sc_PCA(sce.patient2_HS, hvg.patient2_HS, known_markers)
## Adding COL17A1 FCER1G AIF1 CD207 ITGAX CD3E to the gene set used for PCA in Patient2 HS

sce.patient2_AK <- sc_PCA(sce.patient2_AK, hvg.patient2_AK, known_markers)
## Adding GPC3 TNN FCER1G ITGAX CD3E to the gene set used for PCA in Patient2 AK

Clustering

results <- sc_cluster(sce.patient1_HS)

sce.patient1_HS <- results[[1]]
g.patient1_HS <- results[[2]]

results <- sc_cluster(sce.patient1_SCC)

sce.patient1_SCC <- results[[1]]
g.patient1_SCC <- results[[2]]

results <- sc_cluster(sce.patient2_HS)

sce.patient2_HS <- results[[1]]
g.patient2_HS <- results[[2]]

results <- sc_cluster(sce.patient2_AK)

sce.patient2_AK <- results[[1]]
g.patient2_AK <- results[[2]]

Cluster annotation

Known markers

# genes not found in our data : "DCDN" "WISP2" "SEPP1" "TYP1" 
# were removed from the list below
genes <- list(
  # classification: https://www.sciencedirect.com/science/article/pii/S0923181120301985?via%3Dihub#bib0050
  # KERATINOCYTE
  keratinocyte = c("DSC3","DSP","LGALS7"),
  #keratinocyte_basal = c("KRT5","KRT14","COL17A1"),
  #keratinocyte_suprabasal = c("KRT1","KRT10"),
  #keratinocyte_differentiated = c("LOR", "SPINK5"),
  #keratinocyte_ORS = c("KRT6B"),
  #keratinocyte_channel = c("GJB2","GJB6","ATP1B1"),
  #keratinocyte_sebaceous_gland = c("MGST1","FASN"),
  #keratinocyte_sweat_gland  = c("DCD","AQP5"),
  # FIBROBLAST
  fibroblast  = c("DCN","LUM","MMP2"),
  # fibroblast subclassification  from: https://www.nature.com/articles/s42003-020-0922-4#
  #fibroblast_secretory_reticular = c(SLPI","CTHRC1","MFAP5","TSPAN8"),
  #fibroblast_proinflammatory = c("CCL19","APOE","CXCL2","CXCL3","EFEMP1"),
  #fibroblast_secretory_papillary = c("APCDD1","ID1","WIF1","COL18A1","PTGDS"),
  #fibroblast_mesenchymal = c("ASPN","POSTN","GPC3","TNN","SFRP1"),
  # PERICYTE & vSMC
  pericytevSMC  = c("ACTA2","TAGLN"), #vSMC : vascular smooth muscel cell
  #pericytevSMC_pericyte  = c("RGS5"),
  #pericytevSMC_vSMC  = c("MYL9","TPM2","RERGL"),
  # ENDOTHELIAL CELL
  endothelial  = c("PECAM1","SELE","CLDN5","VWF"),
  #endothelial_lymphatic  = c("PROX1","LYVE1"),
  # MYELOID CELL
  myeloid  = c("HLA-DRA","FCER1G","TYROBP","AIF1"),
  #myeloid_dendritic  = c("CD1C"),
  #myeloid_langerhans  = c("CD207"),
  #myeloid_macrophage = c("CD68","RNASE1","ITGAX"),
  # LYMPHOCYTE
  lymphocyte  = c("CD3D","CD3E","CD52","IL7R"),
  # MELANOCYTE
  melanocyte  = c("DCT","MLANA","PMEL"),
  # SCHWANN CELL
  schwann  = c("CDH19","NGFR"),
  # MITOTIC CELL
  mitotic  = c("UBE2C","PCNA")
)

# TODO: try to move the convert_rownames step higher in a block without output so we're less
# likely to run it multiple time (which would lead to weird output)
#rownames(sce.patient1_HS) <- convert_rownames(sce.patient1_HS, g.patient1_HS, genes)
results <- annotate_cells(sce.patient1_HS, g.patient1_HS, genes)
sce.patient1_HS <- results[[1]]
km.patient1_HS <- results[[2]]

#rownames(sce.patient1_SCC) <- convert_rownames(sce.patient1_SCC, g.patient1_SCC, genes)
results <- annotate_cells(sce.patient1_SCC, g.patient1_SCC, genes)
sce.patient1_SCC <- results[[1]]
km.patient1_SCC <- results[[2]]

#rownames(sce.patient2_HS) <- convert_rownames(sce.patient2_HS, g.patient2_HS, genes)
results <- annotate_cells(sce.patient2_HS, g.patient2_HS, genes)
sce.patient2_HS <- results[[1]]
km.patient2_HS <- results[[2]]

#rownames(sce.patient2_AK) <- convert_rownames(sce.patient2_AK, g.patient2_AK, genes)
results <- annotate_cells(sce.patient2_AK, g.patient2_AK, genes)
sce.patient2_AK <- results[[1]]
km.patient2_AK <- results[[2]]

Pseudo-bulk aggregation

#Patient 1 HS
#km.patient1_HS
#km.patient1_SCC
#rowData(sce.patient1_HS[c("keratinocyte", "fibroblast", "endothelial", "myeloid", )])$Symbol

results <- pseudobulk(sce.patient1_HS, km.patient1_HS)

sce.patient1_HS <- results[[1]]
pb.patient1_HS <- results[[2]]
mat.patient1_HS <- results[[3]]
cl2.patient1_HS <- results[[4]]

#Patient 1 SCC
results <- pseudobulk(sce.patient1_SCC, km.patient1_SCC)

sce.patient1_SCC <- results[[1]]
pb.patient1_SCC <- results[[2]]
mat.patient1_SCC <- results[[3]]
cl2.patient1_SCC <- results[[4]]

#Patient 2 HS
results <- pseudobulk(sce.patient2_HS, km.patient2_HS)

sce.patient2_HS <- results[[1]]
pb.patient2_HS <- results[[2]]
mat.patient2_HS <- results[[3]]
cl2.patient2_HS <- results[[4]]

#Patient 2 AK
results <- pseudobulk(sce.patient2_AK, km.patient2_AK)

sce.patient2_AK <- results[[1]]
pb.patient2_AK <- results[[2]]
mat.patient2_AK <- results[[3]]
cl2.patient2_AK <- results[[4]]

# create barplots of cell type proportions
df.celltypes.patient1 <- rbind(df_barplot_celltypes(sce.patient1_HS), df_barplot_celltypes(sce.patient1_SCC))
df.celltypes.patient1$Type <- c("HS", "HS", "HS", "HS", "HS", "HS", "HS", "HS", "SCC", "SCC", "SCC", "SCC", "SCC", "SCC", "SCC", "SCC")

df.celltypes.patient2 <- rbind(df_barplot_celltypes(sce.patient2_HS), df_barplot_celltypes(sce.patient2_AK))
df.celltypes.patient2$Type <- c("HS", "HS", "HS", "HS", "HS", "HS", "HS", "HS", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK")
  
dynamic_barplot(df.celltypes.patient1, "Patient 1", T, "Proportion of cell types in", c(HS ='#999999', SCC = '#E69F00'))

dynamic_barplot(df.celltypes.patient2, "Patient 2", T, "Proportion of cell types in", c(HS ='#999999', AK = '#E69F00'))

##are those results significant or could they have happened by chance?
fisher_test_subtypes(sce.patient1_HS, sce.patient1_SCC, 'keratinocyte')
## 
##  Fisher's Exact Test for Count Data
## 
## data:  subtype_matrix
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.882766 2.415495
## sample estimates:
## odds ratio 
##   2.131753
fisher_test_subtypes(sce.patient1_HS, sce.patient1_SCC, 'lymphocyte')
## 
##  Fisher's Exact Test for Count Data
## 
## data:  subtype_matrix
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.2368988 0.3284277
## sample estimates:
## odds ratio 
##  0.2791748
fisher_test_subtypes(sce.patient1_HS, sce.patient1_SCC, 'fibroblast')
## 
##  Fisher's Exact Test for Count Data
## 
## data:  subtype_matrix
## p-value = 2.8e-08
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3948515 0.6476806
## sample estimates:
## odds ratio 
##  0.5061342

Sub population

known_markers.kera <- c("KRT5","KRT14","COL17A1","KRT1","KRT10","LOR", "SPINK5","KRT6B","GJB2","GJB6","ATP1B1","MGST1","FASN","DCD","AQP5")

#annotate the subclusters
  genes.kera <- list(
  # classification: https://www.sciencedirect.com/science/article/pii/S0923181120301985?via%3Dihub#bib0050
    # KERATINOCYTE
    #keratinocyte = c("DSC3","DSP","LGALS7"),
    basal = c("KRT5","KRT14","COL17A1"),
    suprabasal = c("KRT1","KRT10"),
    differentiated = c("LOR", "SPINK5"),
    ORS = c("KRT6B"),
    channel = c("GJB2","GJB6","ATP1B1"),
    sebaceous_gland = c("MGST1","FASN"),
    sweat_gland  = c("DCD","AQP5")
    )
  
  
# TODO : find a way to avoid giving both known_merkers.kera and genes.kera since both contain the 
  # same information in essence. Maybe will be fixed by the first todo
  # yeah but then we are running an additional for loop/maybe we want different ones (hypothetically at least)
sce.patient1_HS.kera <- cluster_subtype(sce.patient1_HS, "keratinocyte", known_markers.kera, genes.kera)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |================                                                      |  24%
  |                                                                            
  |===================                                                   |  26%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |===================================================                   |  74%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%
## Adding AQP5 to the gene set used for PCA in Patient1 HS

sce.patient1_SCC.kera <- cluster_subtype(sce.patient1_SCC, "keratinocyte", known_markers.kera, genes.kera)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |===============================================================       |  91%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%
## Adding FASN to the gene set used for PCA in Patient1 SCC

sce.patient2_HS.kera <- cluster_subtype(sce.patient2_HS, "keratinocyte", known_markers.kera, genes.kera)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |================                                                      |  24%
  |                                                                            
  |===================                                                   |  26%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |===================================================                   |  74%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%
## Adding FASN to the gene set used for PCA in Patient2 HS

sce.patient2_AK.kera <- cluster_subtype(sce.patient2_AK, "keratinocyte", known_markers.kera, genes.kera)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |===============================================================       |  91%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |======================================================================| 100%
## Adding KRT5 to the gene set used for PCA in Patient2 AK

genes.dyn <- list(
  # classification: https://www.sciencedirect.com/science/article/pii/S0092867420306723
  keratinocyte_basal = "COL17A1",
  keratinocyte_cycling = "MKI67",
  keratinocyte_differentiating = "KRT1"
  )


# dynamic_plot(sce.patient1_HS.kera, g.patient1_HS.kera, genes.dyn)
# dynamic_plot(sce.patient1_SCC.kera, g.patient1_SCC.kera, genes.dyn)
# dynamic_plot(sce.patient2_HS.kera, g.patient2_HS.kera, genes.dyn)
# dynamic_plot(sce.patient2_AK.kera, g.patient2_AK.kera, genes.dyn)

## lelo_new plot

kera.dyn.patient1_HS <- dynamic_plot(sce.patient1_HS.kera, g.patient1_HS.kera, genes.dyn)
## class: SingleCellExperiment 
## dim: 16970 1262 
## metadata(2): Samples scDblFinder.stats
## assays(2): counts logcounts
## rownames(16970): ENSG00000241860.AL627309.5 ENSG00000237491.LINC01409
##   ... ENSG00000278817.AC007325.4 ENSG00000277196.AC007325.2
## rowData names(4): ID Symbol Type scDblFinder.selected
## colnames(1262): AAACCTGAGGGCTCTC-1 AAACCTGCATAGAAAC-1 ...
##   TTTGTCACAATCGGTT-1 TTTGTCAGTTCACCTC-1
## colData names(15): Sample Barcode ... cluster2 cluster3
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(1): Antibody Capture
kera.dyn.patient1_SCC <- dynamic_plot(sce.patient1_SCC.kera, g.patient1_SCC.kera, genes.dyn)
## class: SingleCellExperiment 
## dim: 15718 504 
## metadata(2): Samples scDblFinder.stats
## assays(2): counts logcounts
## rownames(15718): ENSG00000237491.LINC01409 ENSG00000228794.LINC01128
##   ... ENSG00000277666.AC136352.3 ENSG00000273554.AC136616.1
## rowData names(4): ID Symbol Type scDblFinder.selected
## colnames(504): AAACCTGGTAGATTAG-1 AAACGGGGTCCGTGAC-1 ...
##   TTTGCGCGTCGGCATC-1 TTTGGTTAGTCATGCT-1
## colData names(15): Sample Barcode ... cluster2 cluster3
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(1): Antibody Capture
kera.dyn.patient2_HS <- dynamic_plot(sce.patient2_HS.kera, g.patient2_HS.kera, genes.dyn)
## class: SingleCellExperiment 
## dim: 16987 2468 
## metadata(2): Samples scDblFinder.stats
## assays(2): counts logcounts
## rownames(16987): ENSG00000237491.LINC01409 ENSG00000228794.LINC01128
##   ... ENSG00000278817.AC007325.4 ENSG00000277196.AC007325.2
## rowData names(4): ID Symbol Type scDblFinder.selected
## colnames(2468): AAACCTGGTCAAACTC-1 AAACGGGAGGAGCGAG-1 ...
##   TTTGTCATCAGTCAGT-1 TTTGTCATCTATCCCG-1
## colData names(15): Sample Barcode ... cluster2 cluster3
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(1): Antibody Capture
kera.dyn.patient2_AK <- dynamic_plot(sce.patient2_AK.kera, g.patient2_AK.kera, genes.dyn)
## class: SingleCellExperiment 
## dim: 15646 445 
## metadata(2): Samples scDblFinder.stats
## assays(2): counts logcounts
## rownames(15646): ENSG00000241860.AL627309.5 ENSG00000237491.LINC01409
##   ... ENSG00000273554.AC136616.1 ENSG00000277196.AC007325.2
## rowData names(4): ID Symbol Type scDblFinder.selected
## colnames(445): AAAGTAGCACCGCTAG-1 AAATGCCTCGGAATCT-1 ...
##   TTTGGTTCATGAACCT-1 TTTGGTTGTGGTAACG-1
## colData names(15): Sample Barcode ... cluster2 cluster3
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(1): Antibody Capture
kera.dyn.patient1 <- rbind(kera.dyn.patient1_HS, kera.dyn.patient1_SCC)
kera.dyn.patient1$Type <- c("HS", "HS", "HS", "SCC", "SCC", "SCC")

kera.dyn.patient2 <- rbind(kera.dyn.patient2_HS, kera.dyn.patient2_AK)
kera.dyn.patient2$Type <- c("HS", "HS", "HS", "AK", "AK", "AK")


dynamic_barplot(kera.dyn.patient1, "Patient 1", F, "Proportion of Keratinocyte states in", c(HS ='#999999', SCC = '#E69F00'))

dynamic_barplot(kera.dyn.patient2, "Patient 2", F, "Proportion of Keratinocyte states in", c(HS ='#999999', AK = '#E69F00'))

known_markers.fibro <- c("SLPI","CTHRC1","MFAP5","TSPAN8","CCL19","APOE","CXCL2","CXCL3","EFEMP1","APCDD1","ID1","WIF1","COL18A1","PTGDS","ASPN","POSTN","GPC3","TNN","SFRP1")

genes.fibro <- list(
   #fibroblast  = c("DCN","LUM","MMP2"),
#   # fibroblast subclassification  from: https://www.nature.com/articles/s42003-020-0922-4#
   fibroblast_secretory_reticular = c("SLPI","CTHRC1","MFAP5","TSPAN8"),
   fibroblast_proinflammatory = c("CCL19","APOE","CXCL2","CXCL3","EFEMP1"),
   fibroblast_secretory_papillary = c("APCDD1","ID1","WIF1","COL18A1","PTGDS"),
   fibroblast_mesenchymal = c("ASPN","POSTN","GPC3","TNN","SFRP1")
)

sce.patient1_HS.fibro <- cluster_subtype(sce.patient1_HS, "fibroblast", known_markers.fibro, genes.fibro)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |======================================================================| 100%
## Adding WIF1 to the gene set used for PCA in Patient1 HS

sce.patient1_SCC.fibro <- cluster_subtype(sce.patient1_SCC, "fibroblast", known_markers.fibro, genes.fibro)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |======================================================================| 100%
## Adding CCL19 to the gene set used for PCA in Patient1 SCC

sce.patient2_HS.fibro <- cluster_subtype(sce.patient2_HS, "fibroblast", known_markers.fibro, genes.fibro)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==                                                                    |   4%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |====================================================================  |  96%
  |                                                                            
  |======================================================================| 100%

sce.patient2_AK.fibro <- cluster_subtype(sce.patient2_AK, "fibroblast", known_markers.fibro, genes.fibro)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===================================================                   |  72%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |======================================================================| 100%

# The following code was used to check which genes were missing from in the data
# the known_markers are from the litterature

data_markers <- rowData(sce.patient1_HS)$Symbol
known_markers <- c("DSC3","DSP","LGALS7","KRT5","KRT14","COL17A1","KRT1","KRT10","LOR", "SPINK5","KRT6B","GJB2","GJB6","ATP1B1","MGST1","FASN","DCD","AQP5","DCN","LUM","MMP2","WISP2","SLPI","CTHRC1","MFAP5","TSPAN8","CCL19","APOE","CXCL2","CXCL3","EFEMP1","APCDD1","ID1","WIF1","COL18A1","PTGDS","ASPN","POSTN","GPC3","TNN","SFRP1","ACTA2","TAGLN","RGS5","MYL9","TPM2","RERGL","PECAM1","SELE","CLDN5","VWF","PROX1","LYVE1","HLA-DRA","FCER1G","TYROBP","AIF1","CD1C","CD207","CD68","RNASE1","SEPP1","ITGAX","CD3D","CD3E","CD52","IL7R","DCT","MLANA","PMEL","TYP1","CDH19","NGFR","UBE2C","PCNA")


idx_notfound <- which(!(known_markers %in% rowData(sce.patient1_HS)$Symbol))

print(paste("Done:", length(known_markers)-length(idx_notfound), "/", length(known_markers), "markers found"))
cat("Genes missing in dataset: ", known_markers[idx_notfound])

Kim, Doyoung, Kyung Bae Chung, and Tae-Gyun Kim. 2020. “Application of Single-Cell Rna Sequencing on Human Skin: Technical Evolution and Challenges.” Journal of Dermatological Science 99 (2): 74–81. doi:https://doi.org/10.1016/j.jdermsci.2020.06.002.

Solé-Boldo, Raddatz, L. 2020. “Single-Cell Transcriptomes of the Human Skin Reveal Age-Related Loss of Fibroblast Priming.” Communications Biology 3 (188). doi:https://doi.org/10.1038/s42003-020-0922-4.